Clustering Item Data Sets with Association-Taxonomy Similarity
Abstract
This paper explores the efficient clustering of item data. Unlike traditional data, item data is characterized by high dimensionality and sparsity. To account for these features, we devise a novel measure, called association-taxonomy similarity, and use it to perform the clustering. Based on this measure, we develop an efficient clustering algorithm for item data, called algorithm AT (for Association-Taxonomy). We also devise two validation indexes, based on association and taxonomy properties, to assess clustering quality for item data. Experiments on a real dataset show that algorithm AT significantly outperforms prior work in clustering quality as measured by these validation indexes, indicating the usefulness of association-taxonomy similarity for item data clustering.
Similar resources
A Generic Query-Based Model for Scalable Clustering
This paper presents a generic model for clustering that requires no direct knowledge of the nature or representation of the data. In lieu of such knowledge, the relevant-set clustering (RSC) model relies solely on the existence of an oracle that accepts a query in the form of a data item, and returns a ranked set of items relevant to the query. In principle, the role of the oracle could be play...
Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers
In the current study, a particle swarm clustering method is suggested for clustering triangular fuzzy data. The method can find fuzzy cluster centers; the more points from the corresponding cluster a fuzzy cluster center contains, the higher the clustering accuracy. Also, triangular fuzzy numbers are used to represent uncertain data. To compare triangular fuzzy ...
New distance and similarity measures for hesitant fuzzy soft sets
The hesitant fuzzy soft set (HFSS), a combination of hesitant fuzzy sets and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered important tools in different areas such as ...
Clustering Method Study on High-Dimensional Trading Data
Existing clustering algorithms are not designed specifically for the features of trading data, and most clustering analyses lack scalability for large-scale transactions. Therefore, we propose a rapid and scalable clustering algorithm that uses little space, to effectively process high-dimensional trading data without setting parameters manually. The improved method introduces weighted coverag...
Fuzzy Clustering improves Phylogenetic Relationships Reconstruction from Metabolic Pathways
The interest in reconstructing phylogenetic relationships from data on structural similarity of metabolic pathways is growing. The similarity notions and the techniques involved in this reconstruction are assessed by building phylogenetic relationships for model sets of organisms from the similarity measures of the same metabolic pathway for all of them, and then the phylogenetic trees obtained...
Publication date: 2003